UNDER TITLE VI OF THE NATIONAL DEFENSE EDUCATION ACT THE
MODERN LANGUAGE ASSOCIATION OF AMERICA (MLA) UNDERTOOK IN 
1960 TO PREPARE A SERIES OF TESTS IN FRENCH, GERMAN, ITALIAN, 
RUSSIAN, AND SPANISH FOR USE AT TWO LEVELS-- (1) AFTER THE 2D
YEAR OF HIGH SCHOOL LANGUAGE STUDY, OR THE SECOND SEMESTER IN
COLLEGE, AND (2) AFTER THE 4TH YEAR OF HIGH SCHOOL LANGUAGE
STUDY OR THE FOURTH SEMESTER IN COLLEGE. SUPERVISION OF THE 
PROJECT WAS ENTRUSTED TO OUTSTANDING FOREIGN LANGUAGE 
TEACHERS, WHO WORKED INITIALLY IN COOPERATION WITH THE 
EDUCATIONAL TESTING SERVICE (ETS) IN PRINCETON. THE 20
COMMITTEES FORMED PRODUCED A BATTERY OF TESTS, PRE-PRETESTED 
AS WELL AS PRETESTED, WHICH MEASURE LISTENING, SPEAKING, 
READING, AND WRITING PROFICIENCY IN EACH OF THE FIVE 
LANGUAGES. ALTHOUGH THE SCORING SERVICES OF THE MLA-ETS ARE
AVAILABLE, DETAILED INSTRUCTIONS FOR SCORING ACCOMPANY EACH 
TEST IN ORDER TO ELIMINATE SUCH EXPENSE. THE RESULTS SUGGEST 
THAT--(1) THE TESTS ARE OF SUITABLY GREATER THAN MIDDLE
DIFFICULTY, (2) THE PROGRESSION FROM LEVEL TO LEVEL IS WELL 
PLANNED, (3) STUDENTS IN TRADITIONAL COURSES SHOW SOME, BUT 
NOT MARKED, SUPERIORITY IN READING AND WRITING, AND (4) WHILE 
STUDENTS WITH 2 YEARS OF A LANGUAGE IN HIGH SCHOOL DO ABOUT 
AS WELL AS THOSE WITH 1 YEAR OF COLLEGE, THOSE WITH 4 YEARS 
IN HIGH SCHOOL DO SLIGHTLY BETTER THAN THOSE WITH 2 YEARS IN 
COLLEGE. THIS ARTICLE WAS PUBLISHED IN "THE DFL BULLETIN," 
VOLUME 6, NUMBER 2, DECEMBER 1966. (Gj)
THE MLA COOPERATIVE FOREIGN LANGUAGE TESTS:

Tests with a New Look and a New Purpose 



by Miriam M. Bryan 
Educational Testing Service 
Princeton, New Jersey 



As recently as a decade and a half 
ago, most of our foreign language instruction
in both schools and colleges
was aimed at the development of 
competence in reading, and the stand- 
ardized tests used to measure language 
proficiency were essentially reading 
tests. Today, although competence in 
reading is still an important objective, 
our foreign language instruction is 
aimed at the development of compe- 
tence in all four language skills: listen- 
ing, speaking, reading, and writing. 
Just as the emphasis on the develop- 
ment of complete language competence 
has led to the development of new 
instructional procedures and new learn- 
ing materials, so, inevitably, it has led 
to the development of new types of 
testing materials to measure the prod- 
ucts of the new approach. The story 
of the development of these new test- 
ing materials is essentially the story 
of the development of the MLA Coop- 
erative Foreign Language Tests. 

In 1960 the Modern Language As- 
sociation of America received a con- 
tract from the United States Office 
of Education under Title VI of the 
National Defense Education Act for 
the development of a series of tests 
suitable for the evaluation of language 
learning by the audio-lingual approach. 
The amount of the contract was more 
than half a million dollars. 

Major specifications for the tests 
were as follows: 

1. Tests should be developed in 
the five languages most fre- 
quently taught in American 
secondary schools and colleges 
— French, German, Italian, 
Russian, and Spanish. 

2. Tests should measure the four 
language skills — listening, 
speaking, reading, and writing. 

3. Tests should measure skills on 
two levels, the lower level
corresponding to the first and second
year of language learning
in the secondary school or to 
the first and second semester of 
language learning in college, 
and the higher level corres- 
ponding to the third and fourth 
year of language learning in 
the secondary school or to the 
third and fourth semester of 
language learning in college. 

4. There should be two forms of 
each test at each level. 

To assure that the tests would dem- 
onstrate the best practices in language 
teaching, the task of planning and 
writing them was assigned by the 
MLA to committees of outstanding 
foreign language teachers. Professor 
Nelson Brooks of Yale University was 
appointed MLA director of the testing 
program. Educational Testing Service
was invited to furnish the testing 
“know-how.” Mr. Donald D. Walsh, 
director of the Foreign Language 
Program for the MLA, assumed over- 
all responsibility for the program. 

So as to obtain the broadest possi- 
ble representation, the committee mem- 
bers were drawn from all parts of the 
country, from secondary schools and 
colleges, from public and private in- 
stitutions. Twenty committees were 
formed, one for each skill in each 
language. Each committee consisted 
of three members, preferably one col- 
lege teacher, one public school teacher, 
and one private school teacher. On 
most committees at least one member 
was a native speaker of the language 
to be tested. 

It is important for foreign language 
teachers to know about these com- 
mittees because it is important for 
them to be assured that these tests 
were developed neither by “testers” nor
by novices at foreign language
teaching, but by outstanding members 
of the foreign language teaching
profession under the direction of the
highest ranking professional organization
in the field. ETS foreign language 
specialists served as consultants to the 
test committees through the test 
planning and item writing stages, as- 
sisted the test committees in the in- 
terpretation of item analyses and in 
the assembly of test forms, and planned 
and conducted the pretesting and 
standardization programs. 

The chairmen of the twenty com- 
mittees formed an advisory council, 
which met in the fall of 1960, to dis- 
cuss and draw up the broad outlines 
of the program. The committees then 
met in skill groups to put together test 
specifications for each skill separately, 
to draw up content outlines for the 
tests and to begin the actual test con- 
struction. They concluded the major 
portion of their work three years 
later, in the spring of 1963, when final 
forms of the tests were approved for 
standardization. 

The tests are designed to measure 
competence in all four language skills 
in a functional context. In the listen- 
ing tests, the students listen; in the 
speaking tests, they speak; in the read- 
ing tests, they read; and in the writing 
tests, they write. Except for direc- 
tions, which are given in English, the 
tests are entirely in the language 
tested. They contain only complete 
and natural utterances. Isolated vo- 
cabulary, artificial phrases, and wrong 
forms are avoided. All taped material 
is spoken by native voices. The tests 
in all four skills present a wide va- 
riety of contexts, with gradual pro- 
gression from the very simple to the 
very difficult. While the tests were de- 
signed to fill the need for evaluative 
instruments in schools using the audio- 
lingual approach, they can be used in 
any school that has the equipment 
needed to administer them. From the 
most traditional school, in which only 
the reading tests or the reading and 
writing tests may be appropriate, to
the most progressive school, where 
tests in all four skills are appropriate, 
the tests are useful in varying de- 
grees. 

LISTENING: In the listening test, 
which is received through earphones, 
the student is required to answer 
multiple-choice questions based on 
single utterances, on short conversa- 
tions between speakers, on passages 
of connected discourse read by a 
single speaker, on telephone conver- 
sations where the examinee assumes 
the part of one of the speakers, and 
on brief dramatic scenes enacted by 
several voices, men’s and women’s. 
In the lower level tests, the first few 
questions involve visual stimuli and 
responses. The student may record 
his answers in the test booklet or on a 
separate answer sheet. 

SPEAKING: The speaking test allows 
for student reception from master 
tapes through individual recording sta- 
tions; student responses are on tape. 
The student is tested first on ability 
to repeat what is heard with the 
proper pronunciation and intonation, 
then on ability to read aloud with the 
proper pronunciation and intonation 
and with the fluency expected of the 
student at his particular level of
language study, then on ability to give
single sentence responses to questions 
based on picture stimuli, and, finally, 
on ability to describe pictures pre- 
sented singly and in sequences. The 
pictures in the speaking tests are the 
same for all five languages. 

READING: The reading test is a pen- 
cil and paper test of the multiple- 
choice type. The first part of the test 
is aimed at testing knowledge of high 
frequency words and idiomatic ex- 
pressions; in this part the questions 
are presented in sentence completion 
form. The second part includes read- 
ing passages of varying length with 
multiple-choice questions which test 
word or phrase discrimination, getting
the main idea, finding details, and 
drawing conclusions. Materials for 
both sentences and passages are from 
newspapers, periodicals, and literary 
works within the ability of students to 
comprehend. The student may record 
his answers in the test booklet or on 
a separate answer sheet. 

WRITING: The writing test is a
pencil and paper test of the subjective
type. The first part of the test is aimed 
at testing such elements as articles, 
prepositions, pronouns, and auxiliaries 
at the sub-sentence level with items of 
the fill-in type. In the next part the
student is asked to rewrite sentences,
making changes of tense, gender, num- 
ber, person, word order, and sentence 
structure, the type of change being in- 
dicated by clue or example. Finally, 
the student writes short dialogues or 
structured paragraphs based on ver- 
bal stimuli. 

The listening and reading tests are 
objectively scored, whether the an- 
swers are recorded in the test booklet 
or on a separate answer sheet. The 
speaking and writing tests require sub- 
jective scoring. 

The test development program in- 
volved so many departures from tra- 
ditional ways of testing language 
achievement that an extensive
pretesting program and a most
comprehensive standardization program were
necessary. As a preliminary step, in 
the spring of 1961, one test in each skill 
in each language was prepretested in 
a small number of schools, all using 
the audio-lingual approach, so that 
the committees could see if the tests 
were working in the right direction. 
When the results of this prepretesting
were analyzed, the committees rede- 
signed test specifications as they felt 
necessary and proceeded to develop 
three forms of each test for each level, 
a total of 120 tests. 

These tests were pretested in the 
spring of 1961 in 100 public and 
private secondary schools possessing 
the facilities for administering all 
tests. Approximately 40,000 tests were 
administered to 10,000 students, each 
student taking tests in all four skills 
so that the relative difficulty of the 
tests from skill to skill could be de- 
termined. A detailed item analysis was 
based on the results of the pretesting. 
After examining the item analysis 
data, the committees discarded the 
one third of the items which proved to 
be least effective and assembled 80 
final forms, in which only items of the 
highest statistical validity and of the 
proper difficulty were retained. 
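
The counts quoted above follow directly from the test specifications. As a quick check, a minimal sketch of the arithmetic (the variable and function names are illustrative only and do not come from the MLA or ETS):

```python
from itertools import product

LANGUAGES = ["French", "German", "Italian", "Russian", "Spanish"]
SKILLS = ["listening", "speaking", "reading", "writing"]
LEVELS = ["lower", "higher"]

# One committee was formed for each skill in each language.
committees = list(product(LANGUAGES, SKILLS))


def count_forms(forms_per_level: int) -> int:
    """Distinct test forms across all languages, skills, and levels."""
    return len(LANGUAGES) * len(SKILLS) * len(LEVELS) * forms_per_level


pretest_forms = count_forms(3)  # three forms of each test for each level
final_forms = count_forms(2)    # two forms of each test at each level

# Each pretest student took one test in each of the four skills.
pretest_students = 10_000
pretest_administrations = pretest_students * len(SKILLS)

print(len(committees), pretest_forms, final_forms, pretest_administrations)
# prints: 20 120 80 40000
```

Dropping from three forms to two per level is consistent with discarding the weakest third of the items.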

Approximately 80,000 final forms 
were administered in the spring of 
1963 in a norming and equating 
program involving over 20,000 stu- 
dents in more than 400 secondary 
schools and over 100 colleges. The 
schools and colleges participating in 
the standardization program were ran- 
domly selected for participation after 
having previously supplied exhaustive 
information about their language pro- 
grams which made it possible to 
classify them as audio-lingual or
traditional: proportion of class time in
which the target language was used,
amount of translation from the target
language into English and vice versa,
proportion of time used for the explanation
of grammatical constructions
in English, and the like.

As a result of this large-scale testing,
it has been possible to provide for
the lower level tests norms for tests
in all four skills separately for high
school students who have been taught
by the audio-lingual approach, and for
tests in reading and writing norms for
high school students who have followed
a traditional program of instruction.
For the higher level forms, only
general high school norms are provided,
the reason for this being that
at the upper grade levels the results
of differences in instructional approach
are not so clearly pronounced.
For the same reason, all college norms
are general norms. Norms are available
for high school classes at all
levels of instruction except for fourth
year Italian and Russian at the high
school level, and for the speaking test
in German at this same level. In spite
of a thorough combing of the country
for cases for these languages, working
through the MLA and through state
supervisors of foreign languages, it
was not possible to locate enough
fourth year cases to permit the production
of reliable data for this level.
With a single exception, norms are
available for second and fourth semester
classes at the college level for all
tests in all languages; the exception
is the speaking test in Italian, for
which it was not possible to locate
enough cases to permit the development
of norms.

The equating resulting from the
standardization was both horizontal
and vertical, i.e., Form A was equated
to Form B at each level and the lower
level tests to the higher level tests. As
a result of the equating, a score on a
test at one level can be converted to a
score on the same test at the other
level. The ability to perform such a
conversion is essential when scores on
different test forms at the same or at
different levels are being compared.
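
The article does not say which equating method was used. Purely as an illustration of what a score conversion involves, here is a sketch of linear equating, one classical method, which maps a score on one form to the scale of another by matching the two score distributions' means and standard deviations. The function name and all score values are hypothetical, and vertical equating between levels generally calls for a more elaborate design than this.

```python
from statistics import mean, pstdev


def linear_equate(score, from_scores, to_scores):
    """Map `score` from one form's scale to another's.

    Matches the mean and standard deviation of the two score
    distributions (assumed to come from comparable groups).
    """
    mu_x, sd_x = mean(from_scores), pstdev(from_scores)
    mu_y, sd_y = mean(to_scores), pstdev(to_scores)
    return mu_y + (sd_y / sd_x) * (score - mu_x)


# Hypothetical raw scores of comparable groups on two forms.
form_a = [10, 20, 30, 40, 50]
form_b = [14, 23, 32, 41, 50]

# A score at the Form A mean lands on the Form B mean.
converted = linear_equate(30, form_a, form_b)
```

Once both scores are on a common scale, results from different forms can be compared directly.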

The most unusual tests are, of
course, the speaking tests. The speaking
tests represent the first attempt to
develop standardized tests of this skill
for use with secondary school students
and in first and second year college
classes. As might be expected, they
follow closely the plan for the speaking
tests developed earlier for teachers
in the series of MLA Foreign Language
Proficiency Tests for Teachers
and Advanced Students. The test
tapes for the speaking tests are in two 
parts. In the first part the test is pre- 
sented to the students. In the second 
part an actual student recording is 
presented to help the teacher in the 
rating of the tapes. 

All the student tapes recorded in 
the pretesting and standardization pro- 
grams were scored in the MLA-ETS 
scoring center by specially trained 
professional scorers, who were able to
achieve a moderately high degree of 
inter-scorer reliability with recorded 
responses. While professional scoring 
services are available in the MLA- 
ETS scoring center for tapes recorded 
in local or institutional testing pro- 
grams, they are not widely used be- 
cause of their costliness. Teachers who 
score their own tests completely so 
that they can compare the test results
of their students with those of the 
norms groups can do a maximum of 
four tapes per hour. Or teachers may 
prefer simply to spot-check the stu- 
dent tapes and give a broad grade like 
A, B, C, D, or Excellent, Good, Fair, 
Poor. Some teachers have scored their 
tests either completely or by spot- 
checking and then sent a small num- 
ber of tapes to the MLA-ETS scoring 
center for re-scoring as a check on 
their own scoring. 

The writing tests administered in 
the pretesting and standardization 
programs were also rated by the pro- 
fessional scorers in the scoring center. 
Because of the difficulty of scoring 
any kind of writing test with any 
degree of reliability, very detailed di- 
rections for the scoring of these tests 
were worked out for the scorers. Since 
the tests are highly structured, in spite
of the fact that they are written tests,
and the writing is therefore rated
more for correctness than for creativity,
the professional scorers were able
to achieve an inter-scorer reliability 
for the writing tests almost as high as 
that expected for an objective test. 
While scoring services for the writing 
tests are available in the MLA-ETS 
scoring center, the cost is again so 
high that most scoring is being done 
locally. This can be quite easily done 
because of the detailed directions for 
scoring that are provided — including 
sample student compositions rated by 
professional scorers and accompanied 
by detailed descriptions of the reasons 
for the ratings assigned. Teachers who 
score their own writing tests and com- 
pare the results of their students with 
those of the norms groups find that 
they are able to rate from four to six 
tests an hour. 

Here are a few impressions of the
tests and their characteristics, based
originally on the results of the stand- 
ardization program and reinforced by 
test results reported more recently by 
test users. 

1. The tests are of greater than 
middle difficulty. This, it should 
be said, was intended. With the 
remarkable progress being made 
today in the teaching of foreign 
languages at the secondary school 
level and with the extension of 
foreign language instruction to 
the elementary grades, the spon- 
soring agencies and the commit- 
tees were afraid that tests of 
middle difficulty would soon be- 
come too easy to provide adequate 
measures of achievement. 

2. The tests are well adapted to 
the various levels of instruction. 
Steady increase in score is shown 
in the norms as the students 
progress from year to year of 
instruction and from lower to 
higher level form. 

3. While for most languages the 
students in traditional programs 
do show some superiority in 
the reading and writing skills, 
the superiority is not so marked 
as critics of the audio-lingual 
method expected it might be.
As a matter of fact, the superior- 
ity is so slight that no mention is 
made of it in the handbook that 
accompanies the tests. 

4. Students with two years of high 
school instruction and one year 
of college instruction do about 
equally well on the tests, but students
with four years of high
school instruction do slightly better
than students with two years
of college instruction. This is no 
doubt attributable to a great de- 
gree to the fact that four years 
of high school instruction is, as a 
rule, offered in superior high 
schools and the best and most 
serious students of the language 
take the fourth year of instruc- 
tion. The same level of ability 
and seriousness of purpose are 
not always present in college stu- 
dents completing a required sec- 
ond year of language study. 

These, then, are the MLA Coopera- 
tive Foreign Language Tests — the 
tests with the “new look,” which are 
apparently fulfilling most efficiently 
the “new purposes” that they were 
created to serve. They are currently 
being widely used in public secondary 
schools and in independent schools as 
measures of achievement, and they are 
being used in a large number of col- 
leges and universities for placement 
purposes and end-of-course testing. 
They are also serving at all levels of 
foreign language instruction as models 
for teacher-made tests. 

Their enthusiastic acceptance by 
the thousands of foreign language 
teachers who have used the MLA Co- 
operative Foreign Language Tests is 
indeed a fine tribute to the foreign 
language teaching profession as a 
whole — and especially to the profes- 
sional organization which sponsored 
the development of the tests and the 
members of the profession who planned 
and created them. 



English teaching should pay more 
attention to the speaking, listening to, 
and creative uses of language for 
young people at all levels. These were 
among the recommendations of a 
unique conference, the Dartmouth 
Seminar, which brought together 50 
scholars and specialists in the teach- 
ing of English in Anglo-American 
countries for a month-long study. Par- 
ticipants generally agreed that if there 
is a “new English,” it is to be found
in reexamining and reinterpreting a 
child’s experiences in language, not by 
introducing new content. Teachers 
should say less and children more in 
English classrooms, and there should 
be many opportunities for creativity, 
e.g., creative dramatics, imaginative
writing, improvisation, and role playing.
The Seminar criticized the rigidity
of “grouping” or “streaming,” as
these “limit the linguistic environment 
in which boys and girls learn English.” 
Also, present examination patterns 
direct the attention of both teachers 
and pupils “to aspects of English 
which are at best superficial and 
often misleading.” The Seminar was 
financed by the Carnegie Corporation 
and cosponsored by the Modern 
Language Association, the National 
Council of Teachers of English, and 
Great Britain’s National Association 
for the Teaching of English. 



— Education U.S.A., October 20, 1966 



